{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import cufflinks as cf\n",
    "import plotly as py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load the global unicorn company data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 494 entries, 0 to 493\n",
      "Data columns (total 10 columns):\n",
      " #   Column        Non-Null Count  Dtype \n",
      "---  ------        --------------  ----- \n",
      " 0   排名            494 non-null    int64 \n",
      " 1   企业名称          494 non-null    object\n",
      " 2   Company Name  494 non-null    object\n",
      " 3   估值（亿人民币）      494 non-null    int64 \n",
      " 4   国家            494 non-null    object\n",
      " 5   城市            494 non-null    object\n",
      " 6   行业            494 non-null    object\n",
      " 7   掌门人/创始人       494 non-null    object\n",
      " 8   成立年份          494 non-null    int64 \n",
      " 9   部分投资机构        494 non-null    object\n",
      "dtypes: int64(3), object(7)\n",
      "memory usage: 38.7+ KB\n"
     ]
    }
   ],
   "source": [
    "# Read the file and inspect the DataFrame info\n",
    "df = pd.read_csv(\"hurun_unicorn.tsv\", encoding=\"utf8\", sep=\"\\t\")\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 206 entries, 0 to 205\n",
      "Data columns (total 11 columns):\n",
      " #   Column        Non-Null Count  Dtype \n",
      "---  ------        --------------  ----- \n",
      " 0   排名            206 non-null    int64 \n",
      " 1   企业名称          206 non-null    object\n",
      " 2   Company Name  206 non-null    object\n",
      " 3   估值（亿人民币）      206 non-null    int64 \n",
      " 4   国家            206 non-null    object\n",
      " 5   城市            206 non-null    object\n",
      " 6   行业            206 non-null    object\n",
      " 7   掌门人/创始人       206 non-null    object\n",
      " 8   成立年份          206 non-null    int64 \n",
      " 9   部分投资机构        206 non-null    object\n",
      " 10  region        184 non-null    object\n",
      "dtypes: int64(3), object(8)\n",
      "memory usage: 17.8+ KB\n"
     ]
    }
   ],
   "source": [
    "# Read the file and inspect the DataFrame info\n",
    "df_中国 = pd.read_csv(\"hurun.csv\", encoding=\"utf8\", sep=\"\\t\")\n",
    "df_中国.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
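    "Among global unicorn companies, fintech has the highest total valuation, followed by media and entertainment. Emerging industries such as cloud computing and the sharing economy are also doing well."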
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      企业名称  估值（亿人民币）         成立年份\n",
      "行业                               \n",
      "电子商务    68     10310  2011.132353\n",
      "金融科技    56     26020  2011.321429\n",
      "云计算     44      8200  2009.977273\n",
      "人工智能    40      5760  2012.175000\n",
      "物流      34      7500  2011.941176\n"
     ]
    }
   ],
   "source": [
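    "# Per industry (行业): company count, total valuation (100M RMB), mean founding year\n",
    "df_summary = (\n",
    "    df.groupby(\"行业\")\n",
    "    .agg({\"企业名称\": \"count\", \"估值（亿人民币）\": \"sum\", \"成立年份\": \"mean\"})\n",
    "    .sort_values(by=\"企业名称\", ascending=False)\n",
    ")\n",
    "print(df_summary.head(5))  # quick check of the descriptive statistics"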
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Total valuation and company count per (行业 industry, 国家 country) pair\n",
    "行业情况 = (\n",
    "    df[['国家', '企业名称', '排名', '行业', '掌门人/创始人', '估值（亿人民币）']]\n",
    "    .groupby(['行业', '国家'])\n",
    "    .agg({'估值（亿人民币）': 'sum', '企业名称': 'count'})\n",
    "    .sort_values(['估值（亿人民币）', '企业名称'], ascending=False)\n",
    "    .rename(columns={'企业名称': '企业数量'})\n",
    "    .reset_index()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Global industry overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = df.iplot(kind=\"bar\", x=\"行业\", y=\"估值（亿人民币）\", asFigure=True)\n",
    "py.offline.plot(fig, filename=\"templates/example.html\", auto_open=False)\n",
    "with open(\"templates/example.html\", encoding=\"utf8\", mode=\"r\") as f:\n",
    "    plot_all = f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
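    "## Global industry analysis: among global unicorn companies, fintech has the highest total valuation, followed by media and entertainment. Emerging industries such as cloud computing and the sharing economy are also doing well."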
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Global city overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = df.iplot(kind=\"bar\", x=\"城市\", y=\"估值（亿人民币）\", asFigure=True)\n",
    "py.offline.plot(fig, filename=\"example1.html\", auto_open=False)\n",
    "with open(\"example1.html\", encoding=\"utf8\", mode=\"r\") as f:\n",
    "    plot_all = f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
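    "## Analysis: unicorn companies in China and the United States rank near the top; Beijing, China has both the most companies and the highest total valuation."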
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# China city overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = df_中国.iplot(kind=\"bar\", x=\"城市\", y=\"估值（亿人民币）\", asFigure=True)\n",
    "py.offline.plot(fig, filename=\"example2.html\", auto_open=False)\n",
    "with open(\"example2.html\", encoding=\"utf8\", mode=\"r\") as f:\n",
    "    plot_all = f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## China's unicorn companies are spread across 18 Chinese cities; Beijing has both the most companies and the highest total valuation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# China bay area overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = df_中国.iplot(kind=\"bar\", x=\"region\", y=\"估值（亿人民币）\", asFigure=True)\n",
    "py.offline.plot(fig, filename=\"example3.html\", auto_open=False)\n",
    "with open(\"example3.html\", encoding=\"utf8\", mode=\"r\") as f:\n",
    "    plot_all = f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
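    "## Analysis\n",
    "1. The Bohai Bay Area includes Beijing, the cultural center, and the Hangzhou Bay Area includes Shanghai, the economic center. Both have long development histories and deep foundations, as well as many universities and a steady inflow of talent. As a result, these two bay areas are the better developed.\n",
    "2. The Guangdong-Hong Kong-Macao Greater Bay Area only began to grow rapidly after the reform and opening up of 1978, so it differs markedly from the other two bay areas, with lower total valuation and fewer companies."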
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
